Kernel embedding of distributions
In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS).〔A. Smola, A. Gretton, L. Song, B. Schölkopf. (2007). A Hilbert Space Embedding for Distributions. ''Algorithmic Learning Theory: 18th International Conference''. Springer: 13–31.〕 Generalizing the feature map applied to individual data points in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis.〔L. Song, K. Fukumizu, F. Dinuzzo, A. Gretton (2013). Kernel Embeddings of Conditional Distributions: A unified kernel framework for nonparametric inference in graphical models. ''IEEE Signal Processing Magazine'' 30: 98–111.〕 This learning framework is very general and can be applied to distributions over any space \Omega on which a sensible kernel function (measuring similarity between elements of \Omega ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in \mathbb{R}^d , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects.〔J. Shawe-Taylor, N. Cristianini. (2004). ''Kernel Methods for Pattern Analysis''. Cambridge University Press, Cambridge, UK.〕〔T. Hofmann, B. Schölkopf, A. Smola. (2008). Kernel Methods in Machine Learning. ''The Annals of Statistics'' 36(3):1171–1220.〕 The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf.
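Concretely, the RKHS element representing a distribution P(X) is the mean map (stated here for orientation; the notation is developed in the Definitions section below):

\mu_P := \mathbb{E}_{X \sim P}[ k(X, \cdot) ] = \int_\Omega k(x, \cdot) \, \mathrm{d}P(x) \in \mathcal{H}

so that inner products, distances, and other Hilbert space operations on embeddings correspond to comparisons and manipulations of the underlying distributions.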
The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data.〔L. Song. (2008). Learning via Hilbert Space Embedding of Distributions. PhD Thesis, University of Sydney.〕 Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings.
Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages:
# Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables
# Intermediate density estimation is not needed
# Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel)
# If a ''characteristic'' kernel is used, then the embedding uniquely preserves all information about a distribution, while, thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations
# Dimensionality-independent rates of convergence of the empirical kernel mean (estimated from samples drawn from the distribution) to the kernel embedding of the true underlying distribution can be proven
# Learning algorithms based on this framework exhibit good generalization ability and finite-sample convergence guarantees, while often being simpler and more effective than information theoretic methods
Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms.
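As a concrete illustration of the last three points, the following minimal sketch (an illustrative example, not taken from the article; it assumes NumPy and a Gaussian RBF kernel with a hand-picked bandwidth) estimates the squared RKHS distance between the embeddings of two sample sets, the maximum mean discrepancy (MMD), using nothing but Gram matrix operations:

import numpy as np

def rbf_gram(X, Y, sigma=1.0):
    # Gaussian RBF Gram matrix: K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased estimate of ||mu_X - mu_Y||_H^2, the squared RKHS distance
    # between the empirical kernel embeddings of the two sample sets,
    # computed entirely from Gram matrices (no explicit feature map).
    return (rbf_gram(X, X, sigma).mean()
            + rbf_gram(Y, Y, sigma).mean()
            - 2 * rbf_gram(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))  # samples from P
Y = rng.normal(0.5, 1.0, size=(500, 2))  # samples from Q (shifted mean)
Z = rng.normal(0.0, 1.0, size=(500, 2))  # fresh samples from P

print(mmd2_biased(X, Y))  # clearly positive: embeddings of P and Q differ
print(mmd2_biased(X, Z))  # near zero: same underlying distribution

Because the Gaussian RBF kernel is characteristic, the population MMD vanishes exactly when the two distributions coincide, which is what makes such a statistic usable as a nonparametric two-sample test.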
==Definitions==

Let X denote a random variable with codomain \Omega and distribution P(X) . Given a kernel k on \Omega \times \Omega , the Moore–Aronszajn theorem asserts the existence of an RKHS \mathcal{H} (a Hilbert space of functions f: \Omega \to \mathbb{R} equipped with inner product \langle \cdot, \cdot \rangle_{\mathcal{H}} and norm \| \cdot \|_{\mathcal{H}} ) in which the element k(x,\cdot) satisfies the reproducing property \langle f, k(x,\cdot) \rangle_{\mathcal{H}} = f(x) \quad \forall f \in \mathcal{H}, \ \forall x \in \Omega . One may alternatively consider k(x,\cdot) an implicit feature mapping \phi(x) from \Omega to \mathcal{H} (which is therefore also called the feature space), so that k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}} can be viewed as a measure of similarity between points x, x' \in \Omega . While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel.
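To make these definitions concrete, here is a short sketch (again an illustrative example assuming NumPy and a Gaussian RBF kernel) showing that, by the reproducing property, evaluating the empirical kernel mean \hat{\mu} = \frac{1}{n} \sum_{i=1}^n k(x_i, \cdot) at a point t \in \Omega requires only kernel evaluations, never an explicit feature map:

import numpy as np

def k(x, y, sigma=1.0):
    # Gaussian RBF kernel on R^d (an assumed, illustrative choice)
    return np.exp(-np.sum((x - y)**2) / (2 * sigma**2))

rng = np.random.default_rng(1)
sample = rng.normal(size=(100, 3))  # n = 100 draws of X, with Omega = R^3

# The empirical kernel mean mu_hat = (1/n) sum_i k(x_i, .) is itself a
# function in H; the reproducing property gives
#   mu_hat(t) = <mu_hat, k(t, .)>_H = (1/n) sum_i k(x_i, t).
def mu_hat(t):
    return np.mean([k(x_i, t) for x_i in sample])

t = np.zeros(3)
print(mu_hat(t))  # estimate of E[k(X, t)]: the mean embedding evaluated at t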
